Molecular Ecology Resources — Latest Matching Preprints

1

Accuracy of occurrence and abundance estimates from insect metabarcoding

Iwaszkiewicz-Eggebrecht, E.; Granqvist, E.; Nowak, K. H.; Valdivia, C.; Buczek, M.; Srivathsan, A.; Hartop, E.; Miraldo, A.; Roslin, T.; Tack, A. J. M.; Lukasik, P.; Meier, R.; Ronquist, F.

2026-02-22 zoology 10.64898/2026.02.20.707016 medRxiv

Top 0.1%

90.5%

1. DNA metabarcoding (high-throughput sequencing of barcode regions from bulk samples) has become a key tool for insect biodiversity assessment. Yet how methodological choices affect the accuracy of metabarcoding data remains insufficiently explored. In this paper, we ask: (1) How does the lysis method (non-destructive lysis vs. destructive homogenization) affect community recovery? (2) How comprehensively does metabarcoding capture species richness? (3) To what extent can spike-ins improve abundance estimates? (4) How accurately can species abundances be estimated?

2. We evaluated the accuracy of insect metabarcoding using 4,749 bulk samples from a large-scale biodiversity survey, all subjected to mild lysis. Of these samples, 856 were also homogenized, allowing a systematic comparison of the two treatments. To potentially improve abundance estimates, we added six biological spike-ins (i.e., foreign insects) to all samples and two synthetic spike-ins (artificial DNA fragments) to the homogenization treatment. In addition, we established the contents of 15 samples by individually barcoding all specimens, enabling direct assessment of occurrence and abundance estimates.

3. Our results revealed consistent differences between destructive and non-destructive treatments. While both methods reliably detected the majority of species, small and soft-bodied taxa were more often recovered after mild lysis than after homogenization, whereas the reverse was true for heavily sclerotized, hairy, and large taxa. Using biological spike-ins for calibration considerably reduced the variance in read numbers per specimen, especially in homogenized samples, whereas synthetic spike-ins were less effective. In a Bayesian analysis in which species data were matched to the best-fitting spike-in calibration curve, accurate abundance estimates (±1 individual) were obtained for 72.9% of species occurrences.

4. Our results show that it is possible to obtain reasonably accurate abundance estimates from metabarcoding data, and that mild lysis and homogenization result in different taxon-specific biases in occurrence data, with neither method outperforming the other. Accuracy is improved by homogenization rather than mild lysis of samples, and by the use of biological rather than synthetic spike-ins. Together, these findings provide a major step towards robust, quantitative biodiversity monitoring using DNA metabarcoding.
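
The calibration idea in point 3 can be illustrated with a deliberately simplified sketch. It assumes read counts scale linearly with specimen number and uses a single median reads-per-specimen ratio from the biological spike-ins of one sample; this is not the authors' Bayesian model, which matches each species to its best-fitting calibration curve. All function names and numbers below are hypothetical.

```python
from statistics import median

# Hypothetical helpers illustrating spike-in calibration; not the authors' Bayesian model.

def reads_per_individual(spike_reads, spike_counts):
    """Median reads-per-specimen ratio across the biological spike-ins of one sample."""
    ratios = [reads / n for reads, n in zip(spike_reads, spike_counts) if n > 0]
    return median(ratios)


def estimate_abundance(species_reads, spike_reads, spike_counts):
    """Convert read counts to estimated individuals; a detected species counts as at least 1."""
    rpi = reads_per_individual(spike_reads, spike_counts)
    return {
        species: max(1, round(reads / rpi))
        for species, reads in species_reads.items()
        if reads > 0
    }


def within_one(estimates, truth):
    """Fraction of true occurrences whose estimate is within +/-1 individual."""
    hits = [abs(estimates.get(species, 0) - n) <= 1 for species, n in truth.items()]
    return sum(hits) / len(hits)


# Made-up example: three spike-ins added as 2, 4 and 1 specimens.
estimates = estimate_abundance({"A": 3100, "B": 450, "C": 12000}, [2000, 4100, 900], [2, 4, 1])
print(estimates)                                        # {'A': 3, 'B': 1, 'C': 12}
print(within_one(estimates, {"A": 3, "B": 1, "C": 9}))  # 2 of 3 occurrences within +/-1
```

Using a median rather than a mean keeps one poorly amplifying spike-in from dominating the calibration; species-specific curves, as in the preprint, would additionally absorb taxon-level amplification bias.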

2

Estimating Organism Abundance Using Within-Sample Haplotype Frequencies of eDNA Metabarcoding Data

Brandao-Dias, P. F.; Guri, G.; Shaffer, M.; Allan, E. A.; Kelly, R. P.

2025-07-04 molecular biology 10.1101/2025.06.30.662414 medRxiv

Top 0.1%

79.3%

Environmental DNA (eDNA) metabarcoding provides powerful insights into species presence and community composition, but remains limited in its ability to quantify species abundance or structure. Here, we show that deviation between observed haplotype frequencies within a given sample and the population haplotype frequencies can be used to infer the number of individual contributors to an eDNA sample. We also lay out the theory for how population haplotype frequencies can be approximated from eDNA data alone, enabling broad applicability even in the absence of tissue-based references. We then present an estimator to derive the number of individual contributors to a given eDNA sample and validate its performance using simulations with variable allele frequencies and noise. Our framework demonstrates that differences between expected and observed frequencies carry meaningful biological information in eDNA data. Our results show that the number of contributors can be recovered under a range of conditions, particularly with hypervariable markers and sufficient sampling. This approach complements existing molecular methods and opens a new avenue for inferring abundance from eDNA metabarcoding datasets.
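
One way to see why deviations carry information: if an eDNA sample pooled n haploid contributors drawn at random from a population with haplotype frequencies p_i, each observed frequency has variance p_i(1 - p_i)/n, so the summed squared deviation has expectation (1 - Σp_i²)/n. Solving for n gives a method-of-moments estimate. The sketch below is an illustrative estimator built on that assumption, not the estimator presented in the preprint; it ignores unequal DNA shedding, PCR and sequencing noise, and uncertainty in the population frequencies.

```python
# Hypothetical method-of-moments estimator for illustration; not the preprint's estimator.

def estimate_contributors(observed, population):
    """Estimate the number of haploid contributors to one eDNA sample.

    observed:   within-sample haplotype frequencies (e.g. read proportions).
    population: population haplotype frequencies, in the same order.
    Assumes contributors are a random draw from the population, so that
    E[sum (obs - pop)^2] = (1 - sum pop^2) / n.
    """
    if abs(sum(observed) - 1) > 1e-9 or abs(sum(population) - 1) > 1e-9:
        raise ValueError("frequencies must each sum to 1")
    expected_spread = 1 - sum(p * p for p in population)  # n times the expected squared deviation
    deviation = sum((o - p) ** 2 for o, p in zip(observed, population))
    if deviation == 0:
        return float("inf")  # no detectable deviation: consistent with very many contributors
    return expected_spread / deviation


print(estimate_contributors([1.0, 0.0], [0.5, 0.5]))    # 1.0
print(estimate_contributors([0.75, 0.25], [0.5, 0.5]))  # 4.0
```

A single locus gives a noisy estimate; combining deviations across hypervariable markers or replicate samples would reduce that noise, in line with the abstract's point that hypervariable markers and sufficient sampling help.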

3

SPrUCE: Utilizing Ultraconserved Elements of DNA for Population-Level Genetic Diversity Estimation

Melendez, D.; Sapci, A. O. B.; Bafna, V.; Mirarab, S.

2025-11-16 genomics 10.1101/2025.11.14.688492 medRxiv

Top 0.1%

78.9%

Ultraconserved elements (UCEs) are ideal candidates for targeted sequencing and cost-effective acquisition of genome-wide data. While UCEs have been widely used in phylogenetic studies to reconstruct evolutionary relationships, their use in population-level research has been limited. This limited application stems from uncertainty over whether UCEs capture the levels of genetic variation needed to answer population genomic questions central to ecology and biodiversity research. The concern is that, by definition, UCEs are highly conserved and may therefore lack sufficient within-species variation. The more variable flanking regions (400-750 bp from the UCE core) contain informative polymorphisms, but diversity decreases toward the core. Thus, any naive estimator of genetic diversity that ignores this conservation will be biased downward. In this paper, we introduce SPrUCE (Sigmoid Pi requiring UCEs), a reference-free method that estimates nucleotide diversity (π) from aligned UCE data. SPrUCE corrects this underestimation bias by modeling the change in diversity with distance from the UCE core using a Gompertz function. The model accounts for the bias introduced by the conserved core and yields more accurate per-site diversity estimates. We tested SPrUCE on UCE alignments from a range of invertebrate and vertebrate taxa (finches, honeybees, sheep, and smelt). SPrUCE produces diversity values consistent with whole-genome-derived estimates that require an assembled reference. It is fast, scalable, and effective even with missing data. Its modeling approach enables accurate population-level assessments of genetic diversity, offering a new and reliable option for conservation and population genetics.
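
To illustrate why modeling the diversity gradient matters, the sketch below fits a three-parameter Gompertz curve, π(x) = a·exp(-b·exp(-c·x)), to per-site diversity as a function of distance x from the UCE core, and reads the asymptote a as the flank-level diversity. The parameterization, the brute-force grid over b and c, and the noise-free synthetic data are assumptions for illustration; they are not SPrUCE's actual model or fitting procedure.

```python
import math

# Hypothetical illustration of a Gompertz diversity gradient; not SPrUCE's model or fitting code.

def gompertz(x, a, b, c):
    """Diversity at distance x from the core: rises from a*exp(-b) toward asymptote a."""
    return a * math.exp(-b * math.exp(-c * x))


def fit_gompertz(xs, ys):
    """Least-squares fit by grid search over (b, c); a has a closed form for each pair."""
    best = None
    for j in range(2, 21):              # b = 1.0, 1.5, ..., 10.0
        b = j / 2
        for k in range(1, 21):          # c = 0.001, 0.002, ..., 0.020 per bp
            c = k / 1000
            g = [gompertz(x, 1.0, b, c) for x in xs]
            a = sum(y * gi for y, gi in zip(ys, g)) / sum(gi * gi for gi in g)
            sse = sum((y - a * gi) ** 2 for y, gi in zip(ys, g))
            if best is None or sse < best[0]:
                best = (sse, a, b, c)
    return best[1:]  # (a, b, c)


# Noise-free synthetic data, 0-1000 bp from the core.
xs = list(range(0, 1001, 50))
ys = [gompertz(x, 0.004, 5.0, 0.01) for x in xs]
a_hat, b_hat, c_hat = fit_gompertz(xs, ys)
naive = sum(ys) / len(ys)
print(f"asymptote {a_hat:.4f}, naive mean {naive:.4f}")
```

Because every point on the curve lies below a, the naive mean across sites is biased downward, which is the underestimation the abstract describes; real data would call for noise-tolerant fitting and a finer parameter search.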

4

Optimization of ddRAD-like data leads to high quality sets of reduced representation single copy orthologs (R2SCOs) in a sea turtle multi-species analysis

Driller, M.; Vilaca, S. T.; Arantes, L. S.; Carrasco-Valenzuela, T.; Heeger, F.; Chevallier, D.; de Thoisy, B.; Mazzoni, C. J.

2020-04-05 evolutionary biology 10.1101/2020.04.03.024331 medRxiv

Top 0.1%

70.7%

Reduced representation sequencing (RRS) libraries allow large-scale studies of non-model species without the need for a reference genome, by building a pseudo-reference locus catalog directly from the data. However, using closely related high-quality genomes can help maximize the nucleotide variation identified from RRS libraries. While chromosome-level genomes remain unavailable for most species, researchers can still invest in building high-quality, project-specific de novo locus catalogs. Among methods that use restriction enzymes (RADSeq), those that include fragment size selection to obtain the desired number of loci, such as double-digest RAD (ddRAD), are highly flexible but can present important technical issues. Poor reproducibility of size selection across libraries and variable coverage across fragment lengths can affect genotyping confidence, the number of identified single nucleotide polymorphisms (SNPs), and the quality and completeness of the de novo reference catalog. We have developed a strategy to optimize locus catalog building from ddRAD-like data by sequencing overlapping reads that recreate the original fragments and add information about coverage per fragment size. Further in silico size selection and digestion steps restrict the filtered dataset to well-covered sets of loci, and identity thresholds are estimated from pairwise sequence comparisons. We have developed a full workflow that identifies a set of reduced-representation single-copy orthologs (R2SCOs) for any given species and that includes estimating and evaluating allelic variation in comparison with SNP calling results. We also show how to use our concept in an established RADSeq pipeline, Stacks, and confirm that our approach increases average coverage and the number of SNPs called per locus in the final catalog.
We have demonstrated our full workflow using newly generated data from five sea turtle species and provided further proof of principle using published hybrid sea turtle and primate datasets. Finally, we showed that a project-specific set of R2SCOs performs better than a draft genome as a reference.

5

Improved target capture with lower hybridization temperatures for invertebrate loci with different baiting strategies: a case study of the leaf-footed bugs and allies (Hemiptera: Coreoidea)

Forthman, M.; Gordon, E. R. L.; Kimball, R. T.

2022-03-04 genomics 10.1101/2022.03.02.482542 medRxiv

Top 0.1%

68.8%

Target capture approaches are widely used in phylogenomic studies, yet only four experimental comparisons of a critical parameter, hybridization temperature, have been published. These studies reach conflicting conclusions regarding the benefits of lower temperatures during target capture, and none includes invertebrates, where bait-target divergences may be higher than in vertebrate capture studies. Most capture studies use a fixed hybridization temperature of 65 °C to maximize the proportion of on-target data, but many invertebrate capture studies report low locus recovery. Lower hybridization temperatures, which might improve locus recovery, are not commonly employed in invertebrate capture studies. We used leaf-footed bugs and relatives (Hemiptera: Coreoidea) to investigate the effect of hybridization temperature on the capture success of ultraconserved elements (UCEs) targeted by previously published baits derived from divergent hemipteran genomes, and of other loci targeted by newly designed baits derived from less divergent coreoid transcriptomes. We found that touchdown capture approaches with lower hybridization temperatures generally resulted in lower proportions of on-target reads and lower read depth but were associated with more contigs and improved recovery of UCE loci. Low temperatures were also associated with increased numbers of putative paralogs of UCE loci. Hybridization temperature did not generally affect recovery of the newly targeted loci, which we attributed to their lower bait-target divergences (compared with the higher divergences between UCE baits and targets) and greater bait tiling density. Thus, optimizing in vitro target capture conditions to accommodate low hybridization temperatures can provide a cost-effective, widely applicable solution to improve recovery of protein-coding loci in invertebrates.

6

From fluke to fragment: a multifaceted method for molecular sex identification and mitochondrial haplotyping from environmental DNA samples

Rodriguez, L. K.; Schallhart, S.; Hobmeier, P.; Curran, T.; Perez-Jorge, S.; Prieto, R.; Oliveira, C.; Silva, M. A.; Thalinger, B.

2026-05-04 genomics 10.64898/2026.04.30.719183 medRxiv

Top 0.1%

65.0%

Environmental DNA (eDNA) analyses have become a powerful tool for non-invasive biodiversity monitoring, yet the applicability of population genetic approaches to environmental samples remains largely unexplored. Even when genetic traces originate from a single individual, low target DNA concentrations and amplification or sequencing artefacts can compromise downstream genetic inferences. Here, we present a novel approach for obtaining demographic insights and lineage-level mitogenomic information from aquatic eDNA samples collected near individual vertebrates.

Paired eDNA and tissue samples were collected during sperm whale (Physeter macrocephalus) encounters in the Azores. Samples were screened for the presence of vertebrate eDNA and analyzed with a novel molecular sex identification assay. Additionally, long-range PCR was used to amplify up to five mitochondrial DNA fragments (~3-4 kb) for subsequent sequencing on an Oxford Nanopore Technologies platform. A stringent three-tier filtering framework capable of identifying true mitogenomic variation across eDNA samples was developed to maximize recovery of genetic diversity at the haplogroup level. By benchmarking eDNA samples against their paired tissue samples, parameter values were optimized to maximize concordance and minimize spurious variant calls.

Sexing was successful for 50% of eDNA samples, with 96% concordance with paired tissues, and marine vertebrate DNA concentration significantly predicted sexing success. Further, Medaka polishing produced high-identity mitochondrial consensus sequences (>16 kb) from eDNA samples. Across filtering regimes in the framework, curated SNP panels comprising up to 453 high-confidence mitochondrial SNPs resolved 19 haplogroups, with 93% concordance between eDNA and tissue samples. An intermediate bioinformatic filtering strategy maximized biologically accurate haplogroup recovery while minimizing sequencing artefacts, providing the most reliable lineage-level inferences.

This integrative approach demonstrates that targeted nuclear assays combined with long-range mitochondrial sequencing can recover individual-level genetic information from aquatic eDNA. By defining the analytical thresholds that govern success, the framework advances non-invasive genetic monitoring via eDNA and supports population-level monitoring and conservation of endangered and genetically vulnerable species.

7

Estimating hierarchical F-statistics from Pool-Seq data

Gautier, M.; Coronado-Zamora, M.; Vitalis, R.

2024-11-22 genetics 10.1101/2024.11.22.624688 medRxiv

Top 0.1%

62.5%

Introduced over seventy years ago, F-statistics have been and remain central to population and evolutionary genetics. Among them, FST is one of the most commonly used descriptive statistics in empirical studies, notably to characterize the structure of genetic polymorphisms within and between populations, to shed light on the evolutionary history of populations, or to identify marker loci under differential selection for adaptive traits. However, applying FST under simplified population models can overlook important hierarchical structure, such as geographic or temporal subdivision, potentially leading to misleading interpretations and increasing false positives in genome scans for adaptive differentiation. Hierarchical F-statistics have been introduced to account for multiple predefined levels of population structure. Several estimators have also been proposed, including robust ones implemented in the popular R package hierfstat. Nevertheless, these were primarily designed for individual genotyping data and can be computationally intensive for large genomic datasets. In this study, we extend previous work by developing unbiased method-of-moments estimators of hierarchical F-statistics tailored to Pool-Seq data, a cost-effective alternative to individual genome sequencing. These Pool-Seq estimators are developed in an ANOVA framework, using definitions based on identity-in-state probabilities. The new estimators have been implemented in an updated version of the R package poolfstat, together with estimators for sample allele count data derived from individual genotyping data. We validate and compare the performance of these estimators through extensive simulations under a hierarchical island model.
Finally, we apply these estimators to real Pool-Seq data from Drosophila melanogaster populations, demonstrating their usefulness in revealing population structure and identifying loci with high differentiation within or between groups of subpopulations and associated with spatial or temporal genetic variation.
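
For intuition about what a hierarchical decomposition reports, the sketch below computes textbook heterozygosity-based (Nei-style) F_SC, F_CT and F_ST for one biallelic locus from known allele frequencies, weighting subpopulations and groups equally. It is not the poolfstat method-of-moments estimator: it ignores the sampling of individuals into pools and of reads from pools, which is exactly what the Pool-Seq estimators are built to correct. Function names are hypothetical.

```python
# Hypothetical, naive heterozygosity-based decomposition; not the poolfstat estimators.

def het(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2 * p * (1 - p)


def mean(values):
    return sum(values) / len(values)


def hierarchical_f(groups):
    """Return (F_SC, F_CT, F_ST) for one locus.

    groups: list of groups, each a list of subpopulation allele frequencies.
    Subpopulations within a group and groups within the total are weighted equally.
    """
    h_s = mean([mean([het(p) for p in g]) for g in groups])  # within subpopulations
    h_c = mean([het(mean(g)) for g in groups])                # within groups
    h_t = het(mean([mean(g) for g in groups]))                # total
    f_sc = (h_c - h_s) / h_c   # subpopulations within groups
    f_ct = (h_t - h_c) / h_t   # groups within the total
    f_st = (h_t - h_s) / h_t   # subpopulations within the total
    return f_sc, f_ct, f_st


print(hierarchical_f([[0.1, 0.3], [0.7, 0.9]]))  # approximately (0.0625, 0.36, 0.4)
```

The identity (1 - F_ST) = (1 - F_SC)(1 - F_CT) holds by construction here, which makes it a convenient sanity check for any hierarchical output.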

8

Non-invasive fecal DNA yields whole genome and metagenomic data for species conservation

de Flamingh, A.; Ishida, Y.; Pecnerova, P.; Vilchis, S.; Siegismund, H.; van Aarde, R.; Malhi, R.; Roca, A.

2022-08-17 genomics 10.1101/2022.08.16.504190 medRxiv

Top 0.1%

61.6%

Non-invasive biological samples benefit studies of rare, elusive, endangered, and/or dangerous species. Integrating genomic techniques that use non-invasive biological samples with advances in computational approaches can inform wildlife conservation and management. Here we present a molecular pipeline that uses non-invasive fecal DNA samples to generate low- to medium-coverage genomes (e.g., >90% of the complete nuclear genome at 6X coverage) and metagenomic sequences, combining in a novel way widely available and accessible DNA collection cards with commonly used DNA extraction and library building approaches. DNA preservation cards are easy to transport and can be stored without refrigeration, avoiding cumbersome and/or costly sampling methods. The genomic library construction and shotgun sequencing approach did not require enrichment or targeted DNA amplification. The utility and potential of the data generated by this pipeline were demonstrated by applying genome-scale analysis and metagenomics to zoo and free-ranging African savanna elephants (Loxodonta africana). Fecal samples collected from free-ranging individuals contained on average 12.41% (range 5.54-21.65%) endogenous elephant DNA. A principal component analysis of nuclear genome-wide SNPs showed that these elephants clustered with others from the same geographic region. Metagenomic analyses generated compositional taxon classifications that included Loxodonta, green plants, fungi, arthropods, bacteria, viruses, and archaea, showcasing the utility of our approach for addressing complementary questions based on host-associated DNA, e.g., pathogen and parasite identification. The molecular pipeline presented here extends applications beyond what has previously been shown for target-enriched datasets and contributes to the expansion and application of genomic techniques in conservation science and practice.
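
The endogenous fraction drives sequencing cost directly, because only host reads contribute to genome coverage. A back-of-the-envelope sketch, assuming a hypothetical 3.0 Gb genome and 150 bp reads (neither figure comes from the preprint) and ignoring duplicates, mapping losses and uneven coverage, all of which raise real requirements:

```python
# Hypothetical helper: rough read count needed for a target host-genome coverage.

def reads_for_coverage(target_cov, genome_size_bp, read_len_bp, endogenous_frac):
    """Reads to sequence so that host-derived bases reach target_cov x genome size."""
    if not 0 < endogenous_frac <= 1:
        raise ValueError("endogenous_frac must be in (0, 1]")
    host_bases_needed = target_cov * genome_size_bp
    host_bases_per_read = read_len_bp * endogenous_frac  # expected host bases per read
    return host_bases_needed / host_bases_per_read


# 6X target, hypothetical 3.0 Gb genome, 150 bp reads, 12.41% endogenous DNA (mean from the abstract).
r = reads_for_coverage(6, 3.0e9, 150, 0.1241)
print(f"{r:.2e} reads")  # 9.67e+08 reads
```

Across the reported range of 5.54-21.65% endogenous DNA, the same target would need roughly four times as many reads for the poorest sample as for the richest.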

9

Portable, multilocus DNA barcoding across the diversity of meiofauna

Keene, D.; Arya, S.; Walker, B.; Laumer, C. E.

2026-05-22 zoology 10.64898/2026.05.20.726206 medRxiv

Top 0.1%

61.5%

Molecular data have revolutionised taxonomic and ecological research on the hyperdiverse communities of aquatic benthic microinvertebrates known as meiofauna. However, reference sequence databases remain highly incomplete, with different barcode genes or fragments studied in different taxa. Furthermore, there is typically a trade-off between primer universality and phylogenetic resolution: rRNA markers are robustly recoverable but fail to resolve species-level divergences, whereas mitochondrial markers show the reverse trend. Here, we introduce Oxford Nanopore rRNA and COI amplicon sequencing (OrCa-seq), a rapid, low-cost protocol for parallel long-range PCR amplification and multiplexed sequencing of four amplicons, spanning the nearly complete rRNA cistron (~7-8 kb) and the widely studied Folmer region of COI (represented as overlapping 313 and 658 bp amplicons). This protocol, with its associated bioinformatic workflow, was designed for conducting biodiversity inventories of meiofauna and can be easily carried out in field research and educational contexts, with data available from 96-well plates of specimens within a day of lysis. To validate the method, we processed six plates of student-isolated freshwater and limno-terrestrial meiofauna, characterising the recovery of target genes and taxa with both automated and human-curated BLAST database comparisons. These data demonstrate the universal applicability of OrCa-seq across effectively all meiofauna, including the very smallest species. Nonetheless, recovery efficiency for each amplicon varies by taxon, with the full-length Folmer COI amplicon standing out as the most challenging. We present exemplar phylogenetic trees integrating reference sequences, demonstrating the utility of these data in confirming morphological determinations and in identifying anonymous specimens in a reverse taxonomy context.
While developed in a specific educational context for use on meiofauna, the OrCa-seq approach should be readily scalable to larger research datasets and adaptable to many specimen types and to any combination of taxon- or target-specific primers. As such, it represents a compelling multi-locus extension to the ever-growing repertoire of nanopore DNA barcoding protocols.

10

Optimised in-solution enrichment of over a million ancient human SNPs

Davidson, R.; Roca-Rada, X.; Ravishankar, S.; Taufik, L.; Haarkötter, C.; Collen, E.; Webb, P.; Williams, M. P.; Mahmud, M. I.; Djami, E. N. I.; Purnomo, G. A.; Santos, C.; Malagosa, A.; Manzanilla, L. R.; Silva, A. M.; Tereso, S.; Matos, V.; Carvalho, P. C.; Fernandes, T.; Maurer, A.-F.; Teixeira, J. C.; Tobler, R.; Fehren-Schmitz, L.; Llamas, B.

2024-05-16 genomics 10.1101/2024.05.16.594432 medRxiv

Top 0.1%

60.7%

In-solution hybridisation enrichment of genetic markers is a method of choice in paleogenomic studies, where the DNA of interest is generally heavily fragmented and contaminated with environmental DNA, and where the retrieval of genetic data comparable between individuals is challenging. Here, we benchmarked the commercial "Twist Ancient DNA" reagent from Twist Biosciences using sequencing libraries from ancestrally diverse ancient human samples with low to high endogenous DNA content (0.1-44%). For each library, we tested one and two rounds of enrichment, and assessed performance compared to deep shotgun sequencing. We find that the "Twist Ancient DNA" assay provides robust enrichment of ~1.2M target SNPs without introducing allelic bias that may interfere with downstream population genetics analyses. Additionally, we show that pooling up to 4 sequencing libraries and performing two rounds of enrichment is both reliable and cost-effective for libraries with less than 27% endogenous DNA content. Above 38% endogenous content, a maximum of one round of enrichment is recommended for cost-effectiveness and to preserve library complexity. In conclusion, we provide researchers in the field of human paleogenomics with a comprehensive understanding of the strengths and limitations of different sequencing and enrichment strategies, and our results offer practical guidance for optimising experimental protocols.
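
The pooling and enrichment guidance can be written as a small lookup. This sketch encodes only the thresholds stated in the abstract; the abstract gives no explicit recommendation between 27% and 38% endogenous content, and says nothing about pooling above 38%, so the function leaves those entries out rather than guessing. The function and key names are hypothetical.

```python
# Hypothetical encoding of the abstract's guidance; not an official Twist or author tool.

def enrichment_plan(endogenous_pct):
    """Return the recommendation stated in the abstract for a library's endogenous DNA %."""
    if not 0 <= endogenous_pct <= 100:
        raise ValueError("endogenous_pct must be a percentage between 0 and 100")
    if endogenous_pct < 27:
        # Pool up to 4 libraries and run two rounds of enrichment.
        return {"max_libraries_per_pool": 4, "rounds": 2}
    if endogenous_pct > 38:
        # At most one round, for cost-effectiveness and to preserve library complexity.
        return {"max_rounds": 1}
    return {}  # 27-38%: no explicit recommendation in the abstract


print(enrichment_plan(12.0))  # {'max_libraries_per_pool': 4, 'rounds': 2}
print(enrichment_plan(44.0))  # {'max_rounds': 1}
```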

11

A Practical and Cost-Effective Approach to Long-Fragment eDNA Sequencing for High-Resolution Genetic Diversity Assessment

Tsuji, S.; Shibata, N.; Yatsuyanagi, T.; Fuke, Y.

2025-10-11 molecular biology 10.1101/2025.10.11.681776 medRxiv

Top 0.1%

60.5%

Environmental DNA (eDNA) analysis is increasingly recognised as a valuable method for assessing genetic diversity. However, its resolution and applicability are limited by the short length of sequences that can be analysed (typically <400 bp) and by high analytical costs. This study developed a practical, low-cost long-fragment eDNA analysis method using commercial full-length plasmid sequencing on a nanopore platform and evaluated its effectiveness in assessing population genetic structure. Surface water (1 L) was collected from 52 sites across Hokkaido, Japan, targeting Barbatula oreas. Two mitochondrial regions (ND5 and cyt b; approximately 1,000 bp each) were amplified species-specifically, circularised, and sequenced. Library preparation took 2.5 hours, at a total cost per sample of 4,390 JPY (≈25.55 EUR, ≈29.87 USD). High-quality reads were obtained from 34 samples, allowing the reconstruction of multiple haplotypes per region through haplotype phasing. The eDNA concentration required to achieve a 50% sequencing success rate was within a range easily attainable for common species. Phylogenetic analysis of 62 concatenated haplotypes (1,968 bp) obtained from the samples identified two clades and multiple regional subgroups, providing higher-resolution phylogeographic information than a previous study. Furthermore, the differentiation of each clade and group appears to reflect geological and climatic events. These results demonstrate the feasibility and utility of long-fragment eDNA analysis for evaluating genetic diversity, and broad applications are anticipated in ecological research, conservation management, and environmental policy formulation.
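
A 50% success threshold is commonly read off a logistic dose-response curve fitted to per-sample success or failure against log10 eDNA concentration. Whether the authors used this model is an assumption, and the coefficients b0 and b1 below are hypothetical placeholders that would in practice come from a logistic regression fit; concentration units are left unspecified.

```python
import math

# Hypothetical logistic dose-response helpers; coefficients are placeholders, not fitted values.

def success_probability(conc, b0, b1):
    """P(sequencing success) at eDNA concentration conc under a logistic model on log10(conc)."""
    return 1 / (1 + math.exp(-(b0 + b1 * math.log10(conc))))


def concentration_at(prob, b0, b1):
    """Invert the model: concentration at which predicted success equals prob."""
    logit = math.log(prob / (1 - prob))
    return 10 ** ((logit - b0) / b1)


print(concentration_at(0.5, -2.0, 1.0))    # 100.0
print(success_probability(100, -2.0, 1.0)) # 0.5
```

At 50% the logit is zero, so the threshold reduces to 10^(-b0/b1); other targets, such as 80% success, follow from the same inversion.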

12

AmpliPiper: A versatile amplicon-seq analysis tool for multilocus DNA barcoding

Bertelli, A.; Steindl, S.; Kirchner, S.; Schwahofer, P.; Haring, E.; Szucsich, N.; Kruckenhauser, L.; Kapun, M.

2024-12-17 bioinformatics 10.1101/2024.12.11.628038 medRxiv

Top 0.1%

60.3%

The advent of third-generation sequencing technology has revolutionized parallel sequencing of DNA fragments of varying lengths, such as PCR amplicons, providing unprecedented opportunities for large-scale and diverse DNA barcoding projects that, for example, aim to quantify the accelerating biodiversity crisis. However, the broad-scale application of these new technologies for biodiversity research is often hindered by the advanced bioinformatics skills required to carry out quantitative analyses. To facilitate the application of multilocus amplicon sequencing (amplicon-seq) data to biodiversity and integrative taxonomic research questions, we present AmpliPiper, an automated and user-friendly software pipeline that carries out bioinformatic analyses of multilocus amplicon-seq data generated with Oxford Nanopore (ONT) sequencing. AmpliPiper combines analysis methods for DNA barcoding data that include demultiplexing of pooled amplicon-seq data, haplotype-specific consensus sequence reconstruction, species identification based on comparison with the BOLD and GenBank databases, phylogenetic analyses, and species delimitation. We demonstrate the applicability and workflow of our approach with a newly generated dataset of 14 hoverfly (Syrphidae) samples that were amplified and sequenced at four marker genes. We further benchmark our approach against Sanger sequencing and simulated amplicon-seq data, showing that DNA barcoding with ONT is both accurate and sensitive enough to detect even subtle genetic variation.

13

TRIDENT (Taxonomic Resolution and IDentification using Environmental dNa Traces): An Optimized Algorithm for Vertebrate Taxonomic Assignments in eDNA Metabarcoding, Integrating Molecular, Taxonomic, and Ecological Criteria

Haderle, R.; Jung, G.; Riou, M.; Ung, V.; Jung, J.-L.

2026-07-09 molecular biology 10.64898/2026.06.29.735257 medRxiv

Top 0.1%

59.6%

Environmental DNA (eDNA) metabarcoding has become a powerful approach for large-scale biodiversity assessment, yet taxonomic assignment remains one of its most critical and error-prone steps. Current bioinformatic pipelines rely on molecular similarity searches against reference databases, but assignment accuracy is constrained not only by short marker length and database incompleteness, but also by fundamental limitations, including recent species radiations, incomplete lineage sorting, introgression, NUMTs, and the imperfect correspondence between genetic variation and species boundaries. Here, we present TRIDENT (Taxonomic Resolution and IDentification using Environmental dNa Traces), an automated and simple protocol designed to improve taxonomic assignments in eDNA metabarcoding. Initially developed for marine vertebrates, TRIDENT can be used with any barcode and integrates three complementary sources of evidence: molecular similarity (NCBI/GenBank and BOLD), curated taxonomic information (WoRMS), and ecological plausibility derived from biogeographic occurrence data (GBIF). The workflow sequentially constructs candidate taxon lists based on sequence similarity, expands them through taxonomic hierarchies, and filters them using spatial occurrence constraints. It further identifies possible taxa lacking reference barcodes and evaluates their plausibility through CO1-based similarity where data exist in BOLD. TRIDENT has been implemented as a source-available Python tool and tested using empirical eDNA datasets from marine vertebrates as well as simulated communities. Results demonstrate that the tool produces taxonomic assignments consistent with expert manual curation while substantially reducing processing time and the attention errors that arise when large datasets are processed manually.
By combining molecular, taxonomic, and ecological criteria within a single framework, TRIDENT improves transparency and reproducibility and provides a robust and flexible solution strengthening confidence in taxonomic identifications in eDNA-based biodiversity assessments.
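
The three-stage logic (similarity, taxonomic expansion, spatial filtering) can be sketched with plain dictionaries and sets standing in for GenBank/BOLD hits, WoRMS and GBIF. This is a hypothetical illustration of the workflow described above, not TRIDENT's code or API: the 97% identity cut-off and the genus-level fallback are assumptions, the CO1 plausibility check is omitted, and the taxon names are placeholders.

```python
# Hypothetical illustration of a TRIDENT-like three-stage assignment; not TRIDENT's code.

def assign(hits, genus_members, regional_taxa, min_identity=97.0):
    """Return a species name, 'Genus sp.', or None if no plausible assignment remains.

    hits:          (species, percent identity) pairs from a similarity search.
    genus_members: genus -> member species (stand-in for a taxonomic backbone such as WoRMS).
    regional_taxa: species with occurrence records in the study area (stand-in for GBIF).
    """
    # 1. Candidate list from sequence similarity.
    candidates = {species for species, identity in hits if identity >= min_identity}
    # 2. Expand through the taxonomy: add congeners that may lack reference barcodes.
    expanded = set(candidates)
    for species in candidates:
        expanded.update(genus_members.get(species.split()[0], []))
    # 3. Keep only taxa that are plausible in the sampled region.
    plausible = sorted(expanded & regional_taxa)
    if len(plausible) == 1:
        return plausible[0]
    genera = {species.split()[0] for species in plausible}
    if len(genera) == 1:
        return genera.pop() + " sp."
    return None  # nothing plausible, or candidates span several genera


hits = [("Examplea borealis", 99.2), ("Examplea australis", 99.2), ("Otheria minor", 90.1)]
members = {"Examplea": ["Examplea borealis", "Examplea australis"]}
print(assign(hits, members, {"Examplea borealis", "Otheria minor"}))  # Examplea borealis
```

Here two equally similar references are resolved by the occurrence filter alone, which is the kind of ambiguity the ecological criterion is meant to break.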

14

ngsAMOVA: A Probabilistic Framework for Analysis of Molecular Variance, dXY and Neighbor-Joining Trees with Low Depth Sequencing Data

Altinkaya, I.; Zhao, L.; Nielsen, R.; Korneliussen, T. S.

2025-05-15 bioinformatics 10.1101/2025.05.12.653431 medRxiv

Top 0.1%

59.5%

Motivation: Next-generation sequencing (NGS) has transformed population genetics and evolutionary biology, but the data produced in studies of non-model organisms, ancient DNA, and environmental DNA often consist of low- or medium-depth sequencing. Analyses of these data rely on computational methods that use genotype likelihoods (GLs) to account for genotype uncertainty. Nevertheless, many widely used analysis methods, such as analysis of molecular variance (AMOVA) and methods for estimating phylogenetic trees using nucleotide divergence (dXY), still lack the probabilistic frameworks necessary to accommodate GLs.

Results: We introduce ngsAMOVA, a novel probabilistic framework for analyzing molecular variation in population hierarchies with low- and medium-depth sequencing data. It employs an Expectation Maximization algorithm to first estimate the joint genotype probabilities for pairs of individuals, accounting for genotype uncertainty using GLs. It then uses these estimates to generate a pairwise distance matrix, which can be used for AMOVA, estimation of dXY, and estimation of phylogenetic trees using Neighbor-Joining. Hypothesis testing is facilitated by genomic block bootstrapping. Through extensive simulations, we demonstrate that ngsAMOVA provides more accurate results than genotype calling at low and medium read depths. Overall, ngsAMOVA represents a methodological advance in the analysis of molecular variance and divergence under sequencing uncertainty. It provides a robust framework that opens up numerous possibilities for gaining insights into evolutionary histories. ngsAMOVA is available as a fast, efficient, and user-friendly program written in C/C++.

Availability: ngsAMOVA is freely available at https://github.com/isinaltinkaya/ngsAMOVA.

Contact: isin.altinkaya@sund.ku.dk

Supplementary information: Supplementary data are available online.
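
ngsAMOVA's contribution lies in estimating the pairwise distance matrix from genotype likelihoods; the variance partition applied afterwards follows classical AMOVA. The sketch below shows only that second step for a single level of structure, starting from a given distance matrix (hypothetical input) and squaring its entries as classical AMOVA does. It is not ngsAMOVA's implementation and omits block-bootstrap testing; the function name is hypothetical.

```python
# Hypothetical one-level classical AMOVA on a given distance matrix; not ngsAMOVA's code.

def amova_phi_st(dist, labels):
    """Phi_ST from a symmetric N x N distance matrix and one population label per individual."""
    n = len(labels)
    pops = sorted(set(labels))
    members = {p: [i for i, lab in enumerate(labels) if lab == p] for p in pops}
    sq = [[d * d for d in row] for row in dist]  # classical AMOVA works on squared distances

    # Sums of squared deviations: total, within populations, and among populations.
    ssd_total = sum(sq[i][j] for i in range(n) for j in range(n)) / (2 * n)
    ssd_within = sum(
        sum(sq[i][j] for i in idx for j in idx) / (2 * len(idx))
        for idx in members.values()
    )
    ssd_among = ssd_total - ssd_within

    k = len(pops)
    msd_among = ssd_among / (k - 1)
    msd_within = ssd_within / (n - k)
    # n0 corrects for unequal population sizes.
    n0 = (n - sum(len(idx) ** 2 for idx in members.values()) / n) / (k - 1)
    var_among = (msd_among - msd_within) / n0
    return var_among / (var_among + msd_within)


structured = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
print(amova_phi_st(structured, ["A", "A", "B", "B"]))  # 1.0
```

With identical distances between every pair of individuals, the among-population component vanishes and Phi_ST is 0, a useful check on the partition.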

15

Genotype likelihoods incorporated in non-linear dimensionality reduction techniques infer fine-scale population genetic structure

Cilingir, F. G.; Uzel, K.; Grossen, C.

2024-04-01 bioinformatics 10.1101/2024.04.01.587545 medRxiv

Top 0.1%

54.6%

Understanding population structure is essential for conservation genetics, as it provides insights into population connectivity and supports the development of targeted strategies to preserve genetic diversity and adaptability.

t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) have proven effective for revealing population genetic structure in humans and model organisms using hard-called genotypes, but their application to wild species using genotype likelihoods from low-coverage sequencing (a cost-saving measure) remains underexplored.

Here, we present a Jupyter Notebook-based workflow that facilitates the use of UMAP and t-SNE on principal components derived from genotype likelihoods.

This workflow is demonstrated using medium- to low-coverage whole-genome sequencing data from the scimitar-horned oryx, which has been reintroduced into the wild and faces multiple conservation challenges.

Detailed guidance on hyperparameter tuning and practical implementation is also provided, facilitating the application of these methods in wildlife genetics to potentially support biodiversity conservation.

16

Towards population genetic assessments and species abundance from environmental DNA: A case study with zebrafish in controlled aquaria

Meenakshisundaram, A.; Jarman, S.; Power, H.; Kennington, J. W.; Thomas, L.

2025-02-20 ecology 10.1101/2025.02.15.638407 medRxiv

Top 0.1%

54.2%


Developing robust methods for amplifying and analysing highly polymorphic nuclear genetic markers from environmental samples could support reliable and scalable long-term monitoring of elusive, threatened or invasive species that are otherwise challenging to observe. In this study, we used zebrafish in controlled aquaria to apply forensic science approaches and demonstrate that microhaplotypes, short segments of nuclear DNA (100-300 bp) containing two or more single nucleotide polymorphisms (SNPs), can be amplified from trace DNA in water samples to accurately estimate population genetic diversity and species abundance. We successfully amplified a panel of 17 microhaplotypes comprising 69 SNPs, which could reliably estimate population-level allele frequencies and genetic diversity from water DNA. Allele frequencies from the microhaplotype panel amplified from water samples of replicate tanks closely matched estimates from the corresponding tissue samples, and the panel could also be used to estimate the number of contributors in multi-individual samples. Our research demonstrates the effectiveness and potential of amplifying microhaplotype panels from eDNA as a non-invasive and scalable tool for population genetic studies of aquatic species.
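The two summaries mentioned above, genetic diversity and the number of contributors, can be illustrated with a short Python sketch. It assumes reads have already been assigned to microhaplotype alleles per locus, uses a hypothetical read-count threshold to drop noise alleles, computes expected heterozygosity as 1 minus the sum of squared allele frequencies, and applies the maximum-allele-count lower bound familiar from forensic mixture analysis (each diploid contributes at most two alleles per locus). The abstract does not state that the authors used exactly these estimators.

```python
import math


def haplotype_frequencies(read_counts, min_reads=10):
    """Relative frequencies of microhaplotype alleles at one locus.

    read_counts: {haplotype: number_of_reads}. Alleles with fewer than
    `min_reads` reads are dropped as likely noise (hypothetical threshold).
    """
    kept = {h: n for h, n in read_counts.items() if n >= min_reads}
    total = sum(kept.values())
    return {h: n / total for h, n in kept.items()}


def expected_heterozygosity(freqs):
    """Gene diversity H_e = 1 - sum(p_i^2) at one locus."""
    return 1.0 - sum(p * p for p in freqs.values())


def min_contributors(alleles_by_locus):
    """Maximum-allele-count lower bound on the number of diploid contributors."""
    return max(math.ceil(len(alleles) / 2) for alleles in alleles_by_locus.values())


# Usage with hypothetical read counts and allele sets.
mh_freqs = haplotype_frequencies({"ACG": 100, "ATG": 100, "GCG": 3})
print(expected_heterozygosity(mh_freqs))  # 0.5
print(min_contributors({"mh01": {"ACG", "ATG", "GCG"},
                        "mh02": {"AA", "AT", "TA", "TT", "GA"}}))  # 3
```

Because microhaplotypes can carry more than two alleles per locus, the allele-count bound tends to be more informative for them than for single biallelic SNPs.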

17

Let the prey speak: Using PNA clamps to silence predator DNA in marine faecal diet studies

Polanowski, A. M.; Suter, L.; Deagle, B. E.; McInnes, J. C.

2026-07-08 molecular biology 10.64898/2026.06.22.733645 medRxiv

Top 0.1%

54.1%


DNA metabarcoding of faeces is a powerful, non-invasive method for assessing predator diets. However, studies of generalist predators use broad PCR primers to amplify the wide range of potential prey species, so metabarcoding outputs are often dominated by sequences from the predator. While blocking primers can be used to reduce PCR amplification of predator DNA, they frequently suppress the predator only partially and unintentionally block some prey. Peptide nucleic acid (PNA) clamps offer a promising, underutilised alternative: they bind strongly and selectively to predator DNA and block its PCR amplification. In this study we designed and validated a novel PNA clamp targeting the 18S rRNA gene to suppress bird and mammal predator DNA in dietary samples. We tested this clamp on tissue mixtures and faecal samples from three seabird and two seal species across temperate, subantarctic, and Antarctic regions. The PNA clamp substantially increased the proportion of prey reads recovered while maintaining consistent prey community composition across all predator species. Our results demonstrate the general effectiveness of PNA clamps over standard blocking primers and provide a powerful, broadly applicable new tool for improving the accuracy of DNA diet metabarcoding studies.
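The two outcomes reported above, a higher share of prey reads and unchanged prey composition, can be quantified along the lines of this Python sketch. It assumes a per-sample table of read counts by taxon, computes the fraction of non-predator reads, and compares prey compositions with Bray-Curtis dissimilarity (0 means identical). The read counts are invented for illustration, and Bray-Curtis is one common choice of metric, not necessarily the one used in the study.

```python
def prey_profile(read_counts, predator_taxa):
    """Return (fraction of reads from prey, prey composition) for one sample."""
    total = sum(read_counts.values())
    prey = {t: n for t, n in read_counts.items() if t not in predator_taxa}
    prey_total = sum(prey.values())
    composition = {t: n / prey_total for t, n in prey.items()} if prey_total else {}
    return prey_total / total, composition


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two composition dictionaries."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    den = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0


# Hypothetical read counts for one faecal sample, without and with a clamp.
no_clamp = {"predator": 900, "krill": 60, "fish": 40}
with_clamp = {"predator": 100, "krill": 540, "fish": 360}
frac0, comp0 = prey_profile(no_clamp, {"predator"})
frac1, comp1 = prey_profile(with_clamp, {"predator"})
print(frac0, frac1)                         # 0.1 0.9
print(round(bray_curtis(comp0, comp1), 6))  # 0.0
```

In this idealised example the clamp raises the prey fraction ninefold while the prey composition is unchanged; with real data, some dissimilarity is expected from sampling noise alone, so comparisons would normally be made against replicate variation.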

18

An evaluation of pool-sequencing transcriptome-based exon capture for population genomics in non-model species

Deleury, E.; Guillemaud, T.; Blin, A.; Lombaert, E.

2019-09-23 genomics 10.1101/583534 medRxiv

Top 0.1%

54.0%


Exon capture coupled with high-throughput sequencing is a cost-effective technical solution for addressing specific questions in evolutionary biology by focusing on expressed regions of the genome that are preferentially targeted by selection. Transcriptome-based capture, a process that can be used to capture the exons of non-model species, is used in phylogenomics. However, its use in population genomics remains rare due to the high cost of sequencing large numbers of indexed individuals across multiple populations. We evaluated the feasibility of combining transcriptome-based capture with the pooling of tissues from numerous individuals for DNA extraction, as a cost-effective, generic and robust approach for estimating the variant allele frequencies of any species at the population level. We designed capture probes for ~5 Mb of selected de novo transcripts (5,717 transcripts) from the Asian ladybird Harmonia axyridis. We called ~300,000 bi-allelic SNPs for a pool of 36 non-indexed individuals. Capture efficiency was high, and pool-seq was as effective and accurate as individual-seq for detecting variants and estimating allele frequencies. Finally, we evaluated an approach for simplifying bioinformatic analyses by mapping genomic reads directly to the targeted transcript sequences to obtain coding variants. This approach is effective and does not affect the estimation of SNP allele frequencies, except for a small bias close to some exon ends. We demonstrate that this approach can also be used to predict the intron-exon boundaries of targeted de novo transcripts, making it possible to eliminate genotyping biases near exon ends.
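A pool-seq allele frequency is usually estimated as the share of reads carrying the variant allele. The Python sketch below pairs that naive estimator with the variance of a two-stage binomial sampling model: 2n chromosomes are drawn from the population, then reads are drawn with replacement from those chromosomes. The model assumes every individual contributes equal amounts of DNA to the pool, which real pools only approximate; the pool size of 36 comes from the abstract, while the read counts are hypothetical.

```python
def pool_allele_frequency(alt_reads, ref_reads):
    """Naive pool-seq estimate: share of reads carrying the variant allele."""
    return alt_reads / (alt_reads + ref_reads)


def pool_af_variance(p, n_individuals, coverage):
    """Sampling variance of a pool-seq allele frequency estimate.

    Var = p(1-p) * [1/(2n) + (1 - 1/(2n)) / C], from sampling 2n chromosomes
    and then C reads with replacement, assuming equal DNA per individual.
    """
    n_chrom = 2 * n_individuals
    return p * (1 - p) * (1 / n_chrom + (1 - 1 / n_chrom) / coverage)


# Usage: hypothetical counts at one SNP in a pool of 36 diploid individuals.
print(pool_allele_frequency(30, 70))             # 0.3
print(round(pool_af_variance(0.5, 36, 100), 7))  # 0.0059375
```

As coverage grows, the variance approaches p(1-p)/(2n), so the number of pooled individuals sets a floor on precision that extra sequencing cannot remove.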

19

A fast and inexpensive plate-based NGS library preparation method for insect genomics

Cobb, L.; de Muinck, E. J.; Kollias, S.; Skage, M.; Gilfillan, G.; Qiao, S.-W.; Sydenham, M.; Star, B.

2023-11-25 genomics 10.1101/2023.11.24.568434 medRxiv

Top 0.1%

52.1%


Entomological sampling and storage conditions often prioritise efficiency, practicality and the conservation of morphological characteristics, and may therefore be suboptimal for DNA preservation. This can affect downstream molecular applications, such as the generation of high-throughput genomic libraries, which often requires substantial amounts of input DNA. Here, we investigate a fast and economical Tn5 transposase tagmentation-based library preparation method optimised for 96-well plates and low-yield DNA extracts from insect legs stored under different conditions. Using a standardised input of 6 ng DNA, library preparation costs were significantly reduced through a 6-fold dilution of a commercially available tagmentation enzyme. Costs were further reduced by pooling libraries directly after amplification, skipping quality assessment of individual libraries. We find that the reduced DNA yields associated with ethanol-based storage do not impede overall sequencing success. Furthermore, we find that the efficiency of tagmentation-based library preparation can be improved by a thorough post-amplification bead clean-up that selects against both short and long DNA fragments. By lowering data generation costs, extending whole-genome studies to low-yield DNA extracts, and increasing throughput, we expect this protocol to be of significant value for a range of applications in insect genomics.
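The plate-setup arithmetic behind a fixed 6 ng input and a 6-fold enzyme dilution can be sketched in Python as below. The 6 ng target and the 6-fold dilution come from the abstract; the maximum extract volume per well, the enzyme volume per reaction, and the pipetting overage are hypothetical parameters, not values from the protocol.

```python
def input_volume_ul(conc_ng_per_ul, target_ng=6.0, max_volume_ul=10.0):
    """Extract volume giving `target_ng` of DNA, or None if it cannot fit.

    max_volume_ul is a hypothetical per-well limit.
    """
    if conc_ng_per_ul <= 0:
        return None
    volume = target_ng / conc_ng_per_ul
    return volume if volume <= max_volume_ul else None


def diluted_enzyme_mix(n_reactions, ul_per_reaction, dilution=6, overage=0.1):
    """Volumes of enzyme and diluent for a `dilution`-fold working stock.

    ul_per_reaction and overage are hypothetical setup parameters.
    """
    total = n_reactions * ul_per_reaction * (1 + overage)
    enzyme = total / dilution
    return enzyme, total - enzyme


# Usage with hypothetical extract concentrations and volumes.
print(input_volume_ul(2.0))                          # 3.0
print(input_volume_ul(0.3))                          # None
print(diluted_enzyme_mix(96, 1.0, overage=0.0))      # (16.0, 80.0)
```

Under these assumptions a full 96-well plate uses one sixth of the undiluted enzyme volume, which is where the cost saving comes from; extracts too dilute to reach 6 ng within the allowed volume are flagged rather than silently under-loaded.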

20

A Novel eDNA-Based Approach for Hybrid Detection: Implications for Conservation Management

Sakata, M. K.; Yano, N.; Imamura, A.; Yamanaka, H.; Minamoto, T.

2026-03-27 ecology 10.64898/2026.03.26.714632 medRxiv

Top 0.1%

52.0%


Hybridization between invasive and native species poses a hidden but critical threat to biodiversity. While environmental DNA (eDNA) has revolutionized species monitoring, it has lacked the resolution to detect hybrid individuals. Here, we present the first experimental demonstration of hybrid identification using eDNA. Our method isolates individual cells from the environment (hereafter, eCell) and enables cellular-level analysis using multiplex digital PCR targeting nuclear markers from both parental species. Validation in controlled tank experiments using Oncorhynchus masou masou × Salvelinus leucomaenis leucomaenis hybrid individuals confirmed the method's ability to detect hybrid individuals separately from co-habiting purebred parental individuals. This eCell analysis overcomes the limitations of traditional eDNA methods and offers a scalable, non-invasive tool for detecting cryptic hybridization. By enabling early and accurate detection of hybrid individuals, it supports timely conservation decisions, including management prioritization and the protection of purebred populations. This novel technique bridges a critical gap in conservation genetics and enhances eDNA's utility for biodiversity management in the face of global change.
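One reason cellular-level partitioning helps is that a hybrid cell carries nuclear markers from both parental species in the same partition. The Python sketch below shows a generic digital PCR linkage heuristic, assumed here for illustration rather than taken from the study: if the two markers were loaded into partitions independently (for example, from separate purebred cells), double-positive partitions would occur at roughly the product of the single-marker positive rates, so an excess of observed double positives points to markers physically linked in the same cell. The partition counts are hypothetical, and a real analysis would add a formal statistical test and controls for false positives.

```python
def co_positive_excess(total, pos_a, pos_b, pos_ab):
    """Compare observed double-positive partitions with chance co-occupancy.

    pos_a and pos_b include the double-positive partitions (pos_ab).
    Under independent random loading, a partition is positive for both
    markers with probability (pos_a / total) * (pos_b / total).
    Returns (observed, expected, observed / expected).
    """
    expected_ab = total * (pos_a / total) * (pos_b / total)
    ratio = pos_ab / expected_ab if expected_ab else float("inf")
    return pos_ab, expected_ab, ratio


# Usage with hypothetical partition counts from one reaction.
obs, exp_ab, ratio = co_positive_excess(20000, 400, 500, 60)
print(obs, round(exp_ab, 3), round(ratio, 3))  # 60 10.0 6.0
```

A ratio near 1 is what co-habiting purebred individuals alone would be expected to produce under this model, whereas a clearly higher ratio is consistent with cells carrying both parental markers.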